Goto

Collaborating Authors

 landmark point



Deep Structured Prediction for Facial Landmark Detection

Neural Information Processing Systems

Existing deep learning based facial landmark detection methods have achieved excellent performance. These methods, however, do not explicitly embed the structural dependencies among landmark points. They hence cannot preserve the geometric relationships between landmark points or generalize well to challenging conditions or unseen data. This paper proposes a method for deep structured facial landmark detection based on combining a deep Convolutional Network with a Conditional Random Field. We demonstrate its superior performance to existing state-of-the-art techniques in facial landmark detection, especially a better generalization ability on challenging datasets that include large pose and occlusion.


Refined Learning Bounds for Kernel and Approximate k -Means

Neural Information Processing Systems

Kernel $k$-means is one of the most popular approaches to clustering and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order $\mathcal{O}(k/\sqrt{n})$, which do not match with the stated lower bound $\Omega(\sqrt{k/n})$ in terms of $k$, where $k$ is the number of clusters and $n$ is the size of the training set. In this paper, we study the statistical properties of kernel $k$-means and Nystr\{o}m-based kernel $k$-means, and obtain optimal clustering risk bounds, which improve the existing risk bounds. Particularly, based on a refined upper bound of Rademacher complexity [21], we first derive an optimal risk bound of rate $\mathcal{O}(\sqrt{k/n})$ for empirical risk minimizer (ERM), and further extend it to general cases beyond ERM. Then, we analyze the statistical effect of computational approximations of Nystr\{o}m kernel $k$-means, and prove that it achieves the same statistical accuracy as the original kernel $k$-means considering only $\Omega(\sqrt{nk})$ Nystr\{o}m landmark points. We further relax the restriction of landmark points from $\Omega(\sqrt{nk})$ to $\Omega(\sqrt{n})$ under a mild condition.



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

For [23] we refer to the paper pointed by you. 5. To Reviewer_25: the overall conceptual motivation for the paper is somewhat weak... Nystrom approximation can be used to approximate the kernel matrix and speed up kernel machines, and from Table 1 we can see that the performance is suboptimal even when rank=200 (see the 5-th column). In this case, it requires 200 inner product computations to make one prediction, which is too slow for many real-time systems (e.g., web applications, robotic applications ...). Therefore state-of-the-art Nystrom method is not good enough, and we reduce the prediction time to 10~20 inner products with a better classification accuracy, which is a big improvement. Also, as we mentioned in the point 1 above, although we want to optimize the prediction time, our method still has fast training time. We agree that the psuedo landmark point technique can be potentially applied to speed up the training time, and it is an interesting research direction.


S$^3$E: Self-Supervised State Estimation for Radar-Inertial System

Wang, Shengpeng, Xie, Yulong, Liao, Qing, Wang, Wei

arXiv.org Artificial Intelligence

Millimeter-wave radar for state estimation is gaining significant attention for its affordability and reliability in harsh conditions. Existing localization solutions typically rely on post-processed radar point clouds as landmark points. Nonetheless, the inherent sparsity of radar point clouds, ghost points from multi-path effects, and limited angle resolution in single-chirp radar severely degrade state estimation performance. To address these issues, we propose S$^3$E, a \textbf{S}elf-\textbf{S}upervised \textbf{S}tate \textbf{E}stimator that employs more richly informative radar signal spectra to bypass sparse points and fuses complementary inertial information to achieve accurate localization. S$^3$E fully explores the association between \textit{exteroceptive} radar and \textit{proprioceptive} inertial sensor to achieve complementary benefits. To deal with limited angle resolution, we introduce a novel cross-fusion technique that enhances spatial structure information by exploiting subtle rotational shift correlations across heterogeneous data. The experimental results demonstrate our method achieves robust and accurate performance without relying on localization ground truth supervision. To the best of our knowledge, this is the first attempt to achieve state estimation by fusing radar spectra and inertial data in a complementary self-supervised manner.


CMRINet: Joint Groupwise Registration and Segmentation for Cardiac Function Quantification from Cine-MRI

Elmahdy, Mohamed S., Staring, Marius, de Koning, Patrick J. H., Alabed, Samer, Salehi, Mahan, Alandejani, Faisal, Sharkey, Michael, Aldabbagh, Ziad, Swift, Andrew J., van der Geest, Rob J.

arXiv.org Artificial Intelligence

Accurate and efficient quantification of cardiac function is essential for the estimation of prognosis of cardiovascular diseases (CVDs). One of the most commonly used metrics for evaluating cardiac pumping performance is left ventricular ejection fraction (LVEF). However, LVEF can be affected by factors such as inter-observer variability and varying pre-load and after-load conditions, which can reduce its reproducibility. Additionally, cardiac dysfunction may not always manifest as alterations in LVEF, such as in heart failure and cardiotoxicity diseases. An alternative measure that can provide a relatively load-independent quantitative assessment of myocardial contractility is myocardial strain and strain rate. By using LVEF in combination with myocardial strain, it is possible to obtain a thorough description of cardiac function. Automated estimation of LVEF and other volumetric measures from cine-MRI sequences can be achieved through segmentation models, while strain calculation requires the estimation of tissue displacement between sequential frames, which can be accomplished using registration models. These tasks are often performed separately, potentially limiting the assessment of cardiac function. To address this issue, in this study we propose an end-to-end deep learning (DL) model that jointly estimates groupwise (GW) registration and segmentation for cardiac cine-MRI images. The proposed anatomically-guided Deep GW network was trained and validated on a large dataset of 4-chamber view cine-MRI image series of 374 subjects. A quantitative comparison with conventional GW registration using elastix and two DL-based methods showed that the proposed model improved performance and substantially reduced computation time.


Instance Segmentation for Point Sets

Talwar, Abhimanyu, Laasri, Julien

arXiv.org Artificial Intelligence

Recently proposed neural network architectures like PointNet [QSMG16] and PointNet++ [QYSG17] have made it possible to apply Deep Learning to 3D point sets. The feature representations of shapes learned by these two networks enabled training classifiers for Semantic Segmentation, and more recently for Instance Segmentation via the Similarity Group Proposal Network (SGPN) [WYHN17]. One area of improvement which has been highlighted by SGPN's authors, pertains to use of memory intensive similarity matrices which occupy memory quadratic in the number of points. In this report, we attempt to tackle this issue through use of two sampling based methods, which compute Instance Segmentation on a sub-sampled Point Set, and then extrapolate labels to the complete set using the nearest neigbhour approach. While both approaches perform equally well on large sub-samples, the random-based strategy gives the most improvements in terms of speed and memory usage.


Fast Prediction for Large-Scale Kernel Machines

Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

Neural Information Processing Systems

Kernel machines such as kernel SVM and kernel ridge regression usually construct high quality models; however, their use in real-world applications remains limited due to the high prediction cost.


Egoistic MDS-based Rigid Body Localization

Führling, Niclas, Abreu, Giuseppe, G., David González, Gonsa, Osvaldo

arXiv.org Artificial Intelligence

We consider a novel anchorless rigid body localization (RBL) suitable for application in autonomous driving (AD), in so far as the algorithm enables a rigid body to egoistically detect the location (relative translation) and orientation (relative rotation) of another body, without knowledge of the shape of the latter, based only on a set of measurements of the distances between sensors of one vehicle to the other. A key point of the proposed method is that the translation vector between the two-bodies is modeled using the double-centering operator from multidimensional scaling (MDS) theory, enabling the method to be used between rigid bodies regardless of their shapes, in contrast to conventional approaches which require both bodies to have the same shape. Simulation results illustrate the good performance of the proposed technique in terms of root mean square error (RMSE) of the estimates in different setups.